Online data movement without compromising data integrity

ABSTRACT

Embodiments are directed to modifying storage capacity within a data store and to modifying resiliency for a data store. In one scenario, a computer system receives a request to move data. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.

BACKGROUND

Computing systems have become ubiquitous, ranging from small embedded devices to phones and tablets to PCs and backend servers. Each of these computing systems includes some type of data storage and typically, many different types of data storage. For example, a computing system may include solid-state storage and a hard drive or set of hard drives. The solid-state storage may be able to handle read and write I/O requests more quickly than the hard drive, but may not have the storage capacity of the hard drive. Other media such as tape drives, DVDs (or other optical media) or other kinds of media may have different advantages and disadvantages when reading, writing and storing data.

BRIEF SUMMARY

Embodiments described herein are directed to modifying storage capacity within a data store and to modifying resiliency for at least a portion of a data store. In one embodiment, a computer system receives a request to move data. The request to move data may specify a data store to move the data off of, a data store to move the data to, or may allow the computer system to select where the data is moved from and/or moved to. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.

In another embodiment, a computer system modifies resiliency for a data store. The computer system determines that a resiliency scheme for at least part of a data store is to be changed from one resiliency scheme to another resiliency scheme, where the data store is configured to store different portions of data. The computer system determines how the specified portion of data within the data store is to be altered according to the change in resiliency scheme, and modifies the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

Additional features and advantages will be set forth in the description which follows, and in part will be apparent to one of ordinary skill in the art from the description, or may be learned by the practice of the teachings herein. Features and advantages of embodiments described herein may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. Features of the embodiments described herein will become more fully apparent from the following description and appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

To further clarify the above and other features of the embodiments described herein, a more particular description will be rendered by reference to the appended drawings. It is appreciated that these drawings depict only examples of the embodiments described herein and are therefore not to be considered limiting of its scope. The embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates a computer architecture in which embodiments described herein may operate, including modifying storage capacity within a data store.

FIG. 2 illustrates a flowchart of an example method for modifying storage capacity within a data store.

FIG. 3 illustrates a flowchart of an example method for modifying resiliency for at least a portion of a data store.

FIG. 4 illustrates an embodiment in which a resiliency scheme is modified for at least a portion of data.

FIG. 5 illustrates an embodiment in which storage capacity is added and data is rebalanced among remaining data storage.

FIG. 6 illustrates an embodiment in which storage capacity is removed and data is rebalanced among remaining data storage.

DETAILED DESCRIPTION

Embodiments described herein are directed to modifying storage capacity within a data store and to modifying resiliency for at least a portion of a data store. In one embodiment, a computer system receives a request to move data. The request to move data may specify a data store to move the data off of, a data store to move the data to, or may allow the computer system to select where the data is moved from and/or moved to. The computer system may determine that data is to be moved from an allocation on one data store to a new allocation on another data store. The computer system may create a new allocation on the other data store, where the new allocation is configured to receive data from the first data store. The computer system then moves the data to the new allocation on the second data store as data I/O requests are received at the first data store. Data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.

In another embodiment, a computer system modifies resiliency for a data store. The computer system determines that a resiliency scheme for at least part of a data store is to be changed from one resiliency scheme to another resiliency scheme, where the data store is configured to store different portions of data. The computer system determines how the specified portion of data within the data store is to be altered according to the change in resiliency scheme, and modifies the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.

The following discussion now refers to a number of methods and method acts that may be performed. It should be noted that, although the method acts may be discussed in a certain order or illustrated in a flow chart as occurring in a particular order, no particular ordering is necessarily required unless specifically stated, or required because an act is dependent on another act being completed prior to the act being performed.

Embodiments described herein may implement various types of computing systems. These computing systems are now increasingly taking a wide variety of forms. Computing systems may, for example, be handheld devices such as smartphones or feature phones, appliances, laptop computers, wearable devices, desktop computers, mainframes, distributed computing systems, or even devices that have not conventionally been considered a computing system. In this description and in the claims, the term “computing system” is defined broadly as including any device or system (or combination thereof) that includes at least one physical and tangible processor, and a physical and tangible memory capable of having thereon computer-executable instructions that may be executed by the processor. A computing system may be distributed over a network environment and may include multiple constituent computing systems.

As illustrated in FIG. 1, a computing system 101 typically includes at least one processing unit 102 and memory 103. The memory 103 may be physical system memory, which may be volatile, non-volatile, or some combination of the two. The term “memory” may also be used herein to refer to non-volatile mass storage such as physical storage media. If the computing system is distributed, the processing, memory and/or storage capability may be distributed as well.

As used herein, the term “executable module” or “executable component” can refer to software objects, routines, or methods that may be executed on the computing system. The different components, modules, engines, and services described herein may be implemented as objects or processes that execute on the computing system (e.g., as separate threads).

In the description that follows, embodiments are described with reference to acts that are performed by one or more computing systems. If such acts are implemented in software, one or more processors of the associated computing system that performs the act direct the operation of the computing system in response to having executed computer-executable instructions. For example, such computer-executable instructions may be embodied on one or more computer-readable media that form a computer program product. An example of such an operation involves the manipulation of data. The computer-executable instructions (and the manipulated data) may be stored in the memory 103 of the computing system 101. Computing system 101 may also contain communication channels that allow the computing system 101 to communicate with other message processors over a wired or wireless network.

Embodiments described herein may comprise or utilize a special-purpose or general-purpose computer system that includes computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. The system memory may be included within the overall memory 103. The system memory may also be referred to as “main memory”, and includes memory locations that are addressable by the at least one processing unit 102 over a memory bus, in which case the address location is asserted on the memory bus itself. System memory has been traditionally volatile, but the principles described herein also apply in circumstances in which the system memory is partially, or even fully, non-volatile.

Embodiments within the scope of the present invention also include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media that store computer-executable instructions and/or data structures are computer storage media. Computer-readable media that carry computer-executable instructions and/or data structures are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage media and transmission media.

Computer storage media are physical hardware storage media that store computer-executable instructions and/or data structures. Physical hardware storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.

Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by a general-purpose or special-purpose computer system. A “network” is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system, the computer system may view the connection as transmission media. Combinations of the above should also be included within the scope of computer-readable media.

Further, upon reaching various computer system components, program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage media (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a network interface module (e.g., a “NIC”), and then eventually transferred to computer system RAM and/or to less volatile computer storage media at a computer system. Thus, it should be understood that computer storage media can be included in computer system components that also (or even primarily) utilize transmission media.

Computer-executable instructions comprise, for example, instructions and data which, when executed at one or more processors, cause a general-purpose computer system, special-purpose computer system, or special-purpose processing device to perform a certain function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.

Those skilled in the art will appreciate that the principles described herein may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers, switches, and the like. The invention may also be practiced in distributed system environments where local and remote computer systems, which are linked (either by hardwired data links, wireless data links, or by a combination of hardwired and wireless data links) through a network, both perform tasks. As such, in a distributed system environment, a computer system may include a plurality of constituent computer systems. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

Those skilled in the art will also appreciate that the invention may be practiced in a cloud computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and the following claims, “cloud computing” is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services). The definition of “cloud computing” is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.

Still further, system architectures described herein can include a plurality of independent components that each contribute to the functionality of the system as a whole. This modularity allows for increased flexibility when approaching issues of platform scalability and, to this end, provides a variety of advantages. System complexity and growth can be managed more easily through the use of smaller-scale parts with limited functional scope. Platform fault tolerance is enhanced through the use of these loosely coupled modules. Individual components can be grown incrementally as business needs dictate. Modular development also translates to decreased time to market for new functionality. New functionality can be added or subtracted without impacting the core system.

FIG. 1 illustrates a computer architecture 100 in which at least one embodiment may be employed. Computer architecture 100 includes computer system 101. Computer system 101 may be any type of local or distributed computer system, including a cloud computing system. The computer system 101 includes modules for performing a variety of different functions. For instance, the communications module 104 may be configured to communicate with other computing systems. The communications module 104 may include any wired or wireless communication means that can receive and/or transmit data to or from other computing systems. The communications module 104 may be configured to interact with databases, mobile computing devices (such as mobile phones or tablets), embedded or other types of computing systems.

The communications module 104 of computer system 101 may be further configured to receive requests to move data 105. Such requests may be received from applications, from users or from other computer systems. The request to move data 105 may be generated internally to computer system 101, or may be received from a source external to computer system 101. The determining module 106 may determine, based on the received request to move data 105, that data 113 is to be moved from a first data store 112 to a second data store 115. The data stores 112 and 115 may be local to or remote from computer system 101. The data stores may be single storage devices, arrays of storage devices or storage networks such as SANs or the cloud. The data stores may store the data 113 according to resiliency schemes. These resiliency schemes may include data mirroring or parity schemes such as data striping, or any other type of resiliency scheme including the various redundant array of inexpensive disks (RAID) schemes.

In response to the determination that data 113 is to be moved from the first data store 112 to the second data store 115, the allocation creating module 107 of computer system 101 creates a new allocation 116 on the second data store 115. The data moving module 108 may then move the data 113 to the newly created allocation 116 on the second data store 115. In some embodiments, the data stores 112 and 115 may be online data stores that are exposed to the internet. In such cases, data is moved between online databases or other data stores. During this process, any data store access requests (such as a request to move data 105) may be synchronized with the data movement by directing the data store access requests to the first data store 112, to the second data store 115 or to both data stores depending on the type of access request. This process will be described in greater detail below.

As the term is used herein, “online data movement” represents the process of moving allocations containing data from one data store (e.g. a set of hard drives or tape drives) to another. This migration of data takes place without disrupting the functionality or availability of the data store, and without reducing the number of failures that can be tolerated. Additionally, as part of this process, a new set of drives may be selected to transition the data storage space to a different fault domain (e.g. upgrading from being able to tolerate a single enclosure failure, to being able to tolerate a whole rack failure). As used herein, the term “fault domain” may refer to an enclosure (e.g. just a bunch of disks or JBOD), a computer (node), a collection of nodes grouped by a common physical element (e.g. all the blade servers in an enclosure, all the nodes in a rack, or all the nodes behind a specific network switch), or a collection of nodes grouped by a logical element (e.g. an upgrade domain which includes nodes that will be brought down together for servicing). The new set of drives may also increase the storage efficiency of the storage space (i.e. better utilize the drives' capacity), or improve the performance of the storage space (e.g. spread ‘hot’ data (i.e. data that is accessed frequently) across more drives).

Large-scale deployments frequently add and remove hardware as requirements grow and old hardware goes out of warranty. Moreover, workloads may grow and change over time, requiring storage that can adapt to these changes by allowing data to migrate away from drives that have reached their end of life, migrate onto new hardware, and shift around to better utilize the available bandwidth and capacity based on the workload. This is done in real time without compromising the integrity or resiliency of data.

In traditional scenarios, data can be shifted as drives are added or removed; however, the data is typically required to be spread across all drives in the system equally. For example, many RAID cards support increasing the number of drives in an array by increasing the columns of the RAID volume. Also, previous solutions would compromise the integrity of the data in order to perform movement (e.g. treating a disk from which data is to be removed as failed).

Embodiments described herein allow data to be moved between data stores (online or otherwise) based on various criteria including user-defined criteria. Embodiments further provide the ability to selectively move data based on external input or other criteria (such as information about the heat of data), or internal heuristics (such as moving data away from the ends of hard drives to achieve short stroking and thus faster data access times). Embodiments may further include increasing the number of copies in a mirror and converting a parity (RAID5/6) to parity with mirroring (RAID5/6+1) dynamically and sparsely (only on the sections that need to be moved), removing a disk from a RAID array by mirroring its contents across the remaining disks to avoid compromising integrity, moving data across fault domains to increase the resiliency of a RAID array beyond what it had at its initial creation (e.g. migrating an array that can lose an enclosure to one that can lose a rack), and converting a mirror space to a parity space in place (or vice-versa) without rewriting the data.

In some embodiments, data migration is performed by temporarily converting simple and mirror spaces to mirrors with more copies. For this approach to work on parity, the concept of a RAID5+1 will be described. As the term is used herein, RAID5+1 will include a standard parity layer, which has read, write, and reconstruct capabilities. Reads and writes to the underlying disks will be redirected through a mirror layer which has its own read, write, and reconstruct capabilities. To avoid unnecessary complexity in the parity layer, the mirroring layer will provide an aggregated view of all the copies holding each individual column.
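
To make the layering concrete, the following sketch (Python; the Copy, MirroredColumn and ParityLayer names are illustrative and do not appear in the embodiments) shows a parity layer that addresses logical columns while a mirror layer beneath it aggregates each column's copies:

    class Copy:
        """One physical copy of a column, modelled here as an in-memory buffer."""
        def __init__(self, size):
            self.buf = bytearray(size)
            self.healthy = True

        def read(self, off, length):
            return bytes(self.buf[off:off + length])

        def write(self, off, data):
            self.buf[off:off + len(data)] = data

    class MirroredColumn:
        """Aggregated view of every copy holding one column."""
        def __init__(self, copies):
            self.copies = copies

        def read(self, off, length):
            for c in self.copies:              # any healthy copy can serve reads
                if c.healthy:
                    return c.read(off, length)
            raise IOError("no healthy copy for this column")

        def write(self, off, data):
            for c in self.copies:              # writes fan out to all copies
                c.write(off, data)

    class ParityLayer:
        """RAID5-style parity logic; it never sees the individual copies."""
        def __init__(self, columns, block_size):
            self.columns, self.block_size = columns, block_size

        def write_stripe(self, stripe, blocks):
            parity = bytes(self.block_size)
            for b in blocks:                   # XOR parity over the data blocks
                parity = bytes(x ^ y for x, y in zip(parity, b))
            off = stripe * self.block_size
            for col, b in zip(self.columns, list(blocks) + [parity]):
                col.write(off, b)

Because the parity layer only talks to MirroredColumn objects, the number of copies behind a column can be raised or lowered without the parity logic changing.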

When a data migration is to be performed, a task may be used to create another allocation as the destination and temporarily increase the data store's number of copies. This allocation will begin life as stale (i.e. it needs to be reconstructed because it does not contain valid data), and will be picked up and transitioned to healthy by a reconstruction task. In this manner, data migration is performed at the granularity of an allocation within a data store (instead of performing it on every allocation in the data store). Such embodiments offer advantages including, but not limited to, the following: 1) When migrating multiple copies of the same column, only one of the copies needs to be read and can be written to both of the destinations. 2) If a read fails during migration, but other copies of data 113 are available, they will be available to reconstruct from. 3) The ability to read from any copy of data to perform the movement will also increase the ability to parallelize migrations, especially when moving mirrors off of a disk.
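
A minimal sketch of this flow, under the assumption that allocations can be represented as simple in-memory objects (migrate_allocation, STALE and HEALTHY are hypothetical names; the real tasks are internal to the storage stack), might look like the following:

    STALE, HEALTHY = "stale", "healthy"

    class Allocation:
        def __init__(self, device, size, state=HEALTHY):
            self.device, self.data, self.state = device, bytearray(size), state

    def migrate_allocation(source, dest_device):
        # 1. Create the destination allocation; it is stale until reconstructed.
        dest = Allocation(dest_device, len(source.data), state=STALE)

        # 2. Temporarily treat (source, dest) as a mirror with one extra copy,
        #    so incoming writes would be applied to both while this runs.
        mirror = [source, dest]

        # 3. Reconstruction task: copy from any healthy copy into the stale one.
        healthy = next(a for a in mirror if a.state == HEALTHY)
        dest.data[:] = healthy.data
        dest.state = HEALTHY

        # 4. The source can now be retired and the copy count drops back down.
        mirror.remove(source)
        return dest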

In another embodiment, data is migrated between data stores by migrating entire slabs (i.e. collections of allocations that form a resiliency level). This process allocates a whole slab, or set of slabs, at the same offset of a current group of slabs. These new allocations may be marked as a destination in an object pool configuration. By allowing sets of slabs to be migrated, the slab size can change, as well as any other resiliency properties. If the source and destination configurations have different slab sizes, then the migration will be performed on the smallest size that both slab sizes divide evenly (i.e. their least common multiple).
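
The choice of migration unit is simply a least-common-multiple computation; a small illustrative helper (the sizes shown are hypothetical, expressed in MB) is given below:

    from math import gcd

    def migration_unit(src_slab, dst_slab):
        """Smallest extent that both slab sizes divide evenly."""
        return src_slab * dst_slab // gcd(src_slab, dst_slab)

    # For example, moving from 256 MB slabs to 96 MB slabs would be performed
    # in 768 MB extents, since 768 is the least common multiple of 256 and 96.
    print(migration_unit(256, 96))   # 768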

Following the reallocation, a mirror object may be placed above the slabs, forwarding writes to both copies while a task (e.g. a reconstruction task) copies data from the old slab(s) to the new destination slab(s). When this task completes, the old slabs will be discarded and the new slabs will come in as a separate storage tier (to represent any changes in resiliency). If the resiliency type of the destination implements a write-back cache, then a second child space may be allocated to replace the old one. This allows migration between any two resiliency configurations (resiliency type, slab size and fault tolerance can all change).

In another embodiment, whole slabs are migrated with data overlap. This is a variant of the embodiment described above, and would migrate at the slab level, but would not allow the size of a slab to change. To stop the excessive movement of data, only columns which are moving would be reallocated; the remaining columns would be “ghosted” or “no-oped” on the second (destination) slab. The columns would appear to be there, but writes to them would be blocked. This moves a minimal amount of data and still allows upgrades, including resiliency changes.

In yet another embodiment, individual columns may be migrated with RAID level migration. This process may be implemented by two separate mechanisms which work together to provide an end-to-end solution. The first mechanism reallocates individual columns in place. First, a task (such as a pool transaction) creates new allocations and pairs them with the sources that are to be moved. Each source and its destination are then combined into a mirror, with the destination being marked as ‘Needs Regeneration’ or an equivalent marking. These mirrors are then surfaced to the slab as a single allocation, and the regeneration task copies the data from the source to the destination. Upon completion, a task deletes the old allocations and the mirror objects under the slab are replaced by the new allocations. The second mechanism allows conversion between mirror and parity storage spaces. First, the mirroring is separated from the striping by making a storage space with a mirror in place of each allocation. The parity columns are then tacked onto the end and marked as needing regeneration. When this regeneration completes, a second pool transaction selects one copy from each of the mirrors and surfaces a parity slab.
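
The first mechanism can be pictured with the sketch below (Column, pair_columns_for_move and regenerate are hypothetical names; real allocations live on disk rather than in memory): each column that has to move is paired with a fresh destination in a temporary two-way mirror whose destination side is flagged as needing regeneration, while the slab continues to see one allocation per column.

    NEEDS_REGEN = "needs-regeneration"

    class Column:
        def __init__(self, device, size, flag=None):
            self.device, self.size, self.flag = device, size, flag
            self.data = bytearray(size)

    def pair_columns_for_move(slab_columns, moves):
        """moves maps a column index to the target device name."""
        mirrors = {}
        for idx, target in moves.items():
            src = slab_columns[idx]
            dst = Column(target, src.size, NEEDS_REGEN)   # not readable yet
            mirrors[idx] = (src, dst)     # surfaced to the slab as one column
        return mirrors

    def regenerate(mirrors, slab_columns):
        for idx, (src, dst) in mirrors.items():
            dst.data[:] = src.data        # regeneration task fills the destination
            dst.flag = None
            slab_columns[idx] = dst       # the old allocation can now be deleted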

The conversion from mirror to parity results in an enclosure- or rack-aware parity space, the resulting parity space having the correct on-disk format. This process can also be reversed to convert back to a mirror, and a similar process can convert between storage spaces such as 2-way mirrors and 3-way mirrors. During this conversion, some data columns may need to be moved to guarantee the ability to tolerate higher fault domain failure(s) (as mirror has different allocation requirements than parity). This migration may be performed as an intermediate step (after parity has been regenerated) to avoid placing the data store in a state of reduced resiliency. This allows fine-grained control over which allocations move. Moreover, free space is only required on destination drives, and multiple slabs may be migrated in parallel. These concepts will be explained further below with regard to methods 200 and 300 of FIGS. 2 and 3, respectively.

In view of the systems and architectures described above, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to the flow charts of FIGS. 2 and 3. For purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks. However, it should be understood and appreciated that the claimed subject matter is not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement the methodologies described hereinafter.

FIG. 2 illustrates a flowchart of a method 200 for modifying storage capacity within a data store. The method 200 will now be described with frequent reference to the components and data of environment 100.

Method 200 includes receiving a request to move one or more portions of data (210). For example, communications module 104 of computer system 101 may receive a request to move data 105 from a request source. The request source may be an application, service, user or other computer system. The request may specify that data 113 is to be moved from one data store 112 to another data store 115, either or both of which may be online. The data 113 may be individual files, collections of files, blobs of data or other allocations of data such as slabs, metadata or other types of data or collections of data. The request 105 may specify the data store to move data off of (e.g. first data store 112 in FIG. 1), the data store to move data to (e.g. second data store 115 in FIG. 1), or neither (i.e. the request may simply indicate that a certain portion of data is to be moved). If no data store is specified, the computer system 101 may determine which data stores have the specified data and may further determine which data store(s) the data is to be moved to. In such cases, the request 105 may include information about the data stores to aid the system in making the decision. The request may include multiple data sources and multiple data targets.
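
One way to picture such a request is shown below; every field name here is illustrative rather than taken from the embodiments, and the locate and pick_target helpers stand in for whatever selection logic the computer system 101 applies when one side of the request is left open:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class MoveRequest:
        data_ids: List[str]                         # which portions of data to move
        sources: Optional[List[str]] = None         # data store(s) to move off of
        targets: Optional[List[str]] = None         # data store(s) to move to
        hints: dict = field(default_factory=dict)   # e.g. heat or capacity information

    def resolve(request, locate, pick_target):
        """locate(data_id) and pick_target(data_id, hints) are assumed to be
        supplied by the system for whichever side the caller omitted."""
        plan = []
        for data_id in request.data_ids:
            src = request.sources or locate(data_id)
            dst = request.targets or pick_target(data_id, request.hints)
            plan.append((data_id, src, dst))
        return plan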

Method 200 further includes determining that data is to be moved from the first data store to the second data store (220). The determining module 106 of computer system 101 may determine, based on the request to move data 105, that data 113 is to be moved from the first data store 112 to the second data store 115. This determination may include determining which data or data stores are being most heavily utilized. As mentioned above, each data store may include a single storage device or multiple storage devices. In cases where a data store is an array of hard drives, some of the hard drives may be used more heavily than others. Those drives that are constantly being written to may be said to be “hot” or to include “hot data”, whereas drives that are not being written to as often are “cold” or include a greater portion of “cold data.” The determining module 106 may identify which data (among data 113) can be moved, which data must move and where the data is to be moved to. In some cases, data cannot be moved and may be labeled “unmovable data.” If the data can move, the determining module 106 may determine the best location for that data.

These determinations may be made based on various factors including external component input. For example, a heat engine may be implemented which tracks all reads/writes to data in a given data store. Other factors may include heuristics (e.g. moving data away from the ends of drives to achieve short stroking of the hard drive's read head). Still other factors may include characteristics of the data store, including favoring larger drives over smaller drives, or favoring the outside of the drive platter, as it is traveling faster and is capable of quicker reads and writes. The determining module 106 may further be configured to identify where data I/O request bottlenecks are occurring. For example, if multiple applications are trying to write data to a single hard drive or a set of hard drives within the first data store, and the high volume of data writes to those drives is causing an I/O bottleneck, the determining module may determine that existing data on those drives is to be moved to other drives to spread out the I/O requests 111, or that the incoming I/O requests are to be redirected to other drives within the data store (e.g. by the data redirecting module 109) or to a different data store (e.g. the second data store 115).
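
A toy heat engine illustrating this kind of external input is sketched below (the HeatEngine class and its methods are hypothetical; the embodiments only assume that access frequency can be tracked and reported):

    from collections import Counter

    class HeatEngine:
        def __init__(self):
            self.heat = Counter()

        def record_io(self, drive, extent):
            self.heat[(drive, extent)] += 1

        def hottest(self, drive, count=3):
            """Extents on one drive, sorted from most to least accessed."""
            on_drive = {k: v for k, v in self.heat.items() if k[0] == drive}
            return sorted(on_drive, key=on_drive.get, reverse=True)[:count]

    engine = HeatEngine()
    for _ in range(50):
        engine.record_io("disk-1", 7)        # extent 7 on disk-1 is clearly "hot"
    engine.record_io("disk-1", 2)
    print(engine.hottest("disk-1"))          # [('disk-1', 7), ('disk-1', 2)]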

Method 200 further includes creating a new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store (230), and moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, wherein data store access requests are synchronized with data movement by directing the data store access requests to the first data store, the second data store or both data stores depending on the type of access request (240). The allocation creating module 107 of computer system 101 may create the new allocation 116 on the second data store 115. This new allocation 116 may be configured to receive some or all of the data 113 that is moved from the first data store 112 to the second data store 115.
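
The synchronization rule can be summarized in a short sketch (the route helper and the store labels are illustrative only): while the copy is in flight, writes go to both stores so neither falls behind, and reads are served from the first store, which is known to be complete.

    def route(request_type, copy_complete):
        """Return which store(s) should receive a given access request."""
        if request_type == "write" and not copy_complete:
            return ["first", "second"]      # keep both allocations consistent
        if request_type == "read" and not copy_complete:
            return ["first"]                # only the source is known to be complete
        return ["second"]                   # after the move, the new allocation serves I/O

    print(route("write", copy_complete=False))   # ['first', 'second']
    print(route("read",  copy_complete=False))   # ['first']
    print(route("read",  copy_complete=True))    # ['second']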

In some embodiments, the second data store 115 may include at least one hard drive. In such cases, the newly created allocation 116 on the second data store 115 may be located substantially near the beginning of the hard drive (i.e. near the outer edge of the hard drive). In this manner, data may be moved away from the ends of hard drives on the first data store and moved to the beginning of drives on the second data store 115. This allows the data to be accessed more quickly. Other optimizations may be used for other data storage devices such as tape drives or optical drives.

The second data store 115 may be configured to accept new data storage devices and/or new data storage media. In some embodiments, the second data store 115 may include data storage media that was added to the second data store. This second data store may be located on a fault domain that is different from the fault domain of the first data store. For instance, if a fault domain is established for a given hardware storage rack (e.g. first data store 112), the storage media may be added to the second data store 115 which, at least in some embodiments, is in a different fault domain than the first data store. When new media is added, the existing data may be rebalanced based on what kind of hardware was added. Indeed, in some cases, entire racks may be added to existing data stores. In such cases, the existing data may be rebalanced among the hardware storage devices of the newly added rack.

When the rebalancing occurs, the data is not necessarily distributed evenly among the different drives. For instance, when hard drives are added to a data store, some of those hard drives may be of different capacities. In such cases, the full capacity of each hard disk may be assigned to and be accessible by the second data store. Accordingly, each hard drive or tape drive or other type of block storage such as solid-state drives (SSDs), non-volatile memory express (NVMe) devices, virtual hard disks (VHDs), etc. may be used to its fullest extent, even when other drives of larger or smaller capacity are present. When data writes are received at the data store, the data writes may be sent to both the first and second data stores, and incoming data reads may be sent to the first data store until the data of the first data store is copied to the new allocation on the second data store. In this manner, consistency is maintained at the data stores, such that incoming writes can be sent to either data store, while data reads are sent to the older data store until the data is fully copied over to the other (second) data store.
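
The capacity-weighted placement described here can be illustrated with a small sketch (hypothetical rebalance helper and drive names); extents are assigned in proportion to each drive's capacity so that a mixed-capacity set of drives is fully used:

    def rebalance(extent_count, capacities):
        """capacities maps drive name -> capacity; returns extents per drive."""
        total = sum(capacities.values())
        shares = {d: extent_count * c // total for d, c in capacities.items()}
        # Hand any remainder (from integer division) to the largest drives first.
        leftover = extent_count - sum(shares.values())
        for d in sorted(capacities, key=capacities.get, reverse=True)[:leftover]:
            shares[d] += 1
        return shares

    print(rebalance(100, {"4TB": 4000, "4TB-b": 4000, "8TB": 8000}))
    # {'4TB': 25, '4TB-b': 25, '8TB': 50} -- the 8 TB drive absorbs twice as much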

In FIG. 5, a disk array 501 is shown having two hard drives: HD 502A and HD 502B. A new hard drive 502C may be added to the disk array 501 during operation. When the new disk 502C is added, the data of the disk array is rebalanced using the new disk and any existing disks. The rebalancing may be performed without compromising any existing resiliency implementations on the disk array. For instance, if data mirroring has been implemented, the data in HD 502A may be mirrored between the previous disk 502B and the newly added disk 502C. The data may be distributed evenly among the disks of the array, or may be distributed in another manner, such as based on the heat of the data or the overall heat of the disk. Here, it should be noted that while two or three disks are shown in FIG. 5, the disk array 501, or either of the data stores in FIG. 1 (112 and 115), may include substantially any number of disks, tape drives or other storage devices. Moreover, while a mirroring resiliency scheme is implemented in FIGS. 5 and 6, it should be noted that any RAID or other type of mirroring or parity resiliency scheme may be used.

FIG. 6 illustrates an embodiment where at least one hard disk is removed from a disk array 601. The disk array 601 may include hard drives HD 602A, HD 602B, HD 602C and HD 602D. Hard drive 602C may be removed due to failure of the drive or for some other reason. The disk array 601 then includes 602A, 602B and 602D. The data that was on drive 602C is rebalanced among the remaining hard drives. As with the embodiment above where a hard drive was added to the disk array, the data may be rebalanced according to a variety of different factors, and does not need to be rebalanced evenly over the remaining hard drives. Furthermore, as with the above example, disks may be removed from the array 601 without compromising existing resiliency implementations such as mirroring. The data may be automatically and dynamically distributed among the remaining drives in a manner that does not degrade the resiliency of the disk array 601. The data may be rebalanced according to hot or cold data, such that the hot and cold data are distributed evenly among the remaining drives, or may be rebalanced to the beginning of each disk. Additionally or alternatively, data may be rebalanced according to the assigned importance of the data (i.e. the importance of the data may dictate the order in which the data is rebalanced).
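
The constraint that matters when mirroring must survive the removal is that a relocated copy must not land on a drive that already holds another copy of the same extent. A small illustrative sketch (hypothetical structures and data, drive names loosely echoing FIG. 6) follows:

    def relocate_after_removal(removed, placements, drive_free):
        """placements maps extent -> set of drives holding a copy of it;
        drive_free maps each remaining drive -> free extent slots."""
        moves = {}
        for extent, drives in placements.items():
            if removed not in drives:
                continue
            candidates = [d for d in drive_free
                          if d not in drives and drive_free[d] > 0]
            if not candidates:
                raise RuntimeError("not enough independent capacity to keep resiliency")
            target = max(candidates, key=drive_free.get)   # most free space first
            drive_free[target] -= 1
            drives.discard(removed)
            drives.add(target)
            moves[extent] = target
        return moves

    placements = {"e1": {"HD602B", "HD602C"}, "e2": {"HD602C", "HD602D"}}
    print(relocate_after_removal("HD602C", placements,
                                 {"HD602A": 4, "HD602B": 1, "HD602D": 1}))
    # e1 must avoid HD602B, e2 must avoid HD602D; both can land on HD602A.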

Returning to FIG. 1, in some embodiments, data I/O collisions may be prevented during the transition of the data 113 to the new allocation 116 by allowing a first user's data writes to take priority over a second user's data writes, or by allowing a user's data writes to take priority over a computing system's data writes, or vice versa. As such, when writes are coming in from multiple different users or applications, the writes may be prioritized based on user or application and processed in order of priority, such that I/O collisions are avoided. When data has been successfully moved to a new data store (or to a new allocation), any previously used allocations on the first data store may be deleted.

The allocations (whether existing or newly added) are implemented within the data store to logically define specified areas of storage. Each allocation identifies where the allocation is located within the data store, what data it contains and where its data is stored on different data storage devices. The allocations may be stored in a mapping table. Whenever storage devices are added to a data store (such as disk array 501/601 above) or removed from a data store, the computing system 101 may access the mapping table to determine which allocations were stored on the added/removed storage devices. Then, the data stored on the added/removed drives is rebalanced to one or more other storage devices of the data store. In some cases, previously used allocations may include a pointer to the newly created allocation on the data store to which the data is being moved (i.e. the second data store 115). In this manner, if data is deleted during the transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion, and resiliency is guaranteed throughout the transition.
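
The two bookkeeping ideas in this paragraph, a mapping table and a deletion-forwarding pointer on the old allocation, can be sketched as follows (all names are illustrative):

    class Allocation:
        def __init__(self, name, device):
            self.name, self.device = name, device
            self.forward_to = None             # set while this allocation is being moved
            self.deleted = set()

        def delete(self, key):
            self.deleted.add(key)
            if self.forward_to is not None:    # keep the destination in sync too
                self.forward_to.delete(key)

    mapping_table = {}                          # allocation name -> Allocation

    def allocations_on(device):
        """Used when a device is added or removed, to find what must be rebalanced."""
        return [a for a in mapping_table.values() if a.device == device]

    old = Allocation("slab-7", "disk-1")
    new = Allocation("slab-7'", "disk-2")
    mapping_table.update({old.name: old, new.name: new})
    old.forward_to = new
    old.delete("file-42")
    print("file-42" in new.deleted)             # True: the deletion was forwarded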

Turning now to FIG. 3, a flowchart is illustrated of a method 300 for modifying resiliency for at least a portion of a data store. The method 300 will now be described with frequent reference to the components and data of environment 100.

Method 300 includes determining that a resiliency scheme for at least a specified portion of a data store is to be changed from a first resiliency scheme to a second, different resiliency scheme, the data store including one or more portions of data (310). For example, the determining module 106 of computer system 101 may determine that resiliency scheme 114A for at least some data 113 on the first data store 112 is to be changed to a second resiliency scheme 114B. As mentioned above, the resiliency schemes may include mirroring, parity or combinations thereof (including the various RAID implementations) or other resiliency schemes.

Method 300 next includes determining how the data within the specified portion of the data store is to be altered according to the change in resiliency scheme (320) and modifying the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed (330). The determining module 106 of computer system 101 may thus determine how the data 113 is to be altered according to the change in resiliency scheme (e.g. from mirroring to parity or from parity to mirroring). The modifying module 110 of computer system 101 may then modify the resiliency scheme for a certain portion of data, while leaving other portions of data untouched.

Thus, for example, as shown in FIG. 4, data store 401 has multiple different data portions (402A, 402B and 402C). These data portions may each be different storage devices (e.g. hard disks), may be logical portions of the same hard disk, or may be a combination of physical and logical data portions. Each data portion within the data store may have its own resiliency scheme: scheme 403A for data portion 402A, scheme 403B for data portion 402B, and scheme 403C for data portion 402C. Embodiments herein may modify a portion of a data store (e.g. 402B) and its resiliency scheme without modifying other portions of the data store or their resiliency schemes. Thus, when modifications 404 are made to the data store portion 402B, a new resiliency scheme 403D may be implemented for that data portion without affecting any other data portions.

In some cases, a storage device may be added to a data store. At least one portion of that data store may be implementing an N-way mirror resiliency scheme. When the new device is added, an N+1-way mirroring scheme may be implemented for the data store, such that the data store's data is split between two storage devices. The split need not be even, and may be balanced according to heuristics such as relative heat level. Still further, in some cases, a storage device may be removed from a data store. The data that was stored on the removed data storage device may be rebalanced among the remaining storage devices, without rebalancing existing data on the remaining storage devices. The granularity of the data store portions that are to be converted from one resiliency scheme to another may be set to an arbitrary value (e.g. 1 GB) or may be substantially any size. In this manner, whole volumes or arrays need not be converted to change a resiliency scheme. Rather, embodiments herein may convert one section of an array or volume from mirroring to parity or vice versa, while leaving the rest of the volume or array alone. Then, if a user wants to remove one drive, the system can merely rebalance/realign the data on that drive or that portion of the data store.
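
Tracking the resiliency scheme per fixed-size extent is one way to realize this granularity; the sketch below (hypothetical structures, 1 GB extents as in the example above) converts only the extents that the request touches and leaves every other extent's scheme alone:

    EXTENT = 1 << 30                              # 1 GB conversion granularity

    def change_resiliency(scheme_map, start_byte, length, new_scheme):
        """scheme_map maps extent index -> scheme name for the whole data store."""
        first = start_byte // EXTENT
        last = (start_byte + length - 1) // EXTENT
        for idx in range(first, last + 1):
            scheme_map[idx] = new_scheme          # only the touched extents change
        return scheme_map

    store = {i: "2-way mirror" for i in range(8)}  # an 8 GB data store
    change_resiliency(store, 2 * EXTENT, 3 * EXTENT, "parity")
    print(store)
    # extents 2-4 are now 'parity'; extents 0-1 and 5-7 keep their mirror scheme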

Accordingly, methods, systems and computer program products are provided which modify storage capacity within a data store. Moreover, methods, systems and computer program products are provided which modify resiliency for at least a portion of a data store.

Claim Support

A computer system is provided including at least one processor. At the computer system, a computer-implemented method is provided for modifying storage capacity within a data store. The method includes receiving a request 105 to move one or more portions of data, determining that data 113 is to be moved from an allocation on a first data store 112 to a new allocation 116 on the second data store 115, the first and second data stores being configured to store allocations of data, creating the new allocation 116 on the second data store 115, the new allocation being configured to receive at least a portion of data 113 from the first data store 112, and moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, wherein data store access requests are synchronized with the data movement by directing the data store access requests to the first data store 112, to the second data store 115 or to both data stores depending on the type of access request.

In some embodiments, determining that data is to be moved from the first data store to the second data store comprises determining which data or data stores are being most heavily utilized. In some embodiments, the second data store comprises at least one hard drive, and wherein the new allocation on the second data store is located nearer to the beginning of the second data store than the allocation on the first data store. In some embodiments, the second data store comprises data storage media that was added to the computing system, the second data store being located on a fault domain that is different from the fault domain of the first data store. In some embodiments, the fault domain comprises a hardware storage rack, such that the second data store comprises data storage media that was added to a hardware storage rack that is different from the hardware storage rack of the first data store.

A computer system is provided including at least one processor. At the computer system, a computer-implemented method is provided for modifying resiliency for at least a portion of a data store. The method includes determining that a resiliency scheme 114A for at least a specified portion of a data store 112 is to be changed from a first resiliency scheme 114A to a second, different resiliency scheme 114B, the data store including one or more portions of data 113, determining how the data 113 within the specified portion of the data store 112 is to be altered according to the change in resiliency scheme, and modifying the resiliency scheme 114A of the specified portion of the data store 112, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.

Some embodiments further include adding a storage device to the data store, wherein the specified portion of the data store is implementing an N-way mirror resiliency scheme, and implementing an N+1-way mirroring scheme for the data store, wherein the data store data is split between two storage devices. Other embodiments further include removing a storage device from the data store and rebalancing the data that was stored on the removed data storage device among the remaining storage devices, without rebalancing existing data on the remaining storage devices.

A computer system is provided comprising the following: one or more processors; a receiver 104 for receiving a request 105 to move one or more portions of data off of a first data store 112 and on to a second data store 115; a determining module 106 for identifying which data 113 is to be moved from the first data store to the second data store; an allocation creating module 107 for creating a new allocation 116 on the second data store 115, the new allocation being configured to receive at least a portion of data 113 from the first data store 112; and a data moving module 108 for moving the data 113 to the new allocation 116 on the second data store 115 as data I/O requests 111 are received at the first data store, such that data writes are sent to both the first and second data stores, and data reads are sent to the first data store 112 until the data 113 of the first data store is copied to the new allocation 116 on the second data store 115.

Some embodiments further include removing at least one storage device from the data store, accessing the mapping table to determine which allocations were stored on the removed storage devices, and rebalancing the data of the allocations stored on the removed drive to one or more other storage devices of the data store. In some embodiments, the second data store comprises a plurality of block storage devices, at least two of which are of different capacity. Some embodiments further include adding at least one hard disk to the plurality of block storage devices in the second data store and rebalancing at least a portion of data stored on the first data store among the newly added hard drive and at least one of the existing plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.

Some embodiments further include removing at least one hard disk from the plurality of hard disks in the first data store and rebalancing at least a portion of data stored on the first data store among the remaining hard disks of the plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store. In some embodiments, data I/O collisions are prevented during transition of the data to the new allocation by allowing a user's data writes to take priority over the computing system's data writes. In some embodiments, the previously used allocation includes a pointer to the newly created allocation on the second data store, such that if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion.

The concepts and features described herein may be embodied in other specific forms without departing from their spirit or descriptive characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the disclosure is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

We claim:
 1. At a computer system including at least one processor, a computer-implemented method for modifying storage capacity within a data store, the method comprising: receiving a request to move one or more portions of data; determining that data is to be moved from an allocation on a first data store to a new allocation on the second data store, the first and second data stores being configured to store allocations of data; creating the new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store; and moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, wherein data store access requests are synchronized with the data movement by directing the data store access requests to the first data store, to the second data store or to both data stores depending on the type of access request.
 2. The method of claim 1, wherein determining that data is to be moved from the first data store to the second data store comprises determining which data or data stores are being most heavily utilized.
 3. The method of claim 1, wherein determining that data is to be moved from the first data store to the second data store further comprises determining which data among the stored data is moveable.
 4. The method of claim 1, wherein the second data store comprises at least one hard drive, and wherein the new allocation on the second data store is located nearer to the beginning of the second data store than the allocation on the first data store.
 5. The method of claim 1, wherein the second data store comprises a data storage media that was added to the computing system, the second data store being located on a fault domain that is different from the fault domain of the first data store.
 6. The method of claim 5, wherein the fault domain comprises a hardware storage rack, such that the second data store comprises data storage media that was added to a hardware storage rack that is different from the hardware storage rack of the first data store.
 7. The method of claim 1, wherein the second data store comprises a plurality of block storage devices, at least two of which are of different capacity.
 8. The method of claim 7, further comprising: adding at least one hard disk to the plurality of block storage devices in the second data store; and rebalancing at least a portion of data stored on the first data store among the newly added hard drive and at least one of the existing plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
 9. The method of claim 7, further comprising: removing at least one hard disk from the plurality of hard disks in the first data store; and rebalancing at least a portion of data stored on the first data store among the remaining hard disks of the plurality of hard disks, the rebalancing being performed without compromising existing resiliency implementations on the second data store.
 10. The method of claim 1, wherein data I/O collisions are prevented during transition of the data to the new allocation by allowing a user's data writes to take priority over the computing system's data writes.
 11. The method of claim 1, further comprising deleting one or more previously used allocations on the first data store upon determining that the data contained in the allocation has been moved to the second data store.
 12. The method of claim 11, wherein the previously used allocation includes a pointer to the newly created allocation on the second data store, such that if data is deleted during transition of the data from the first data store to the second data store, the newly created allocation is notified of the deletion.
 13. At a computer system including at least one processor, a computer-implemented method for modifying resiliency for at least a portion of a data store, the method comprising: determining that a resiliency scheme for at least a specified portion of a data store is to be changed from a first resiliency scheme to a second, different resiliency scheme, the data store including one or more portions of data; determining how the data within the specified portion of the data store is to be altered according to the change in resiliency scheme; and modifying the resiliency scheme of the specified portion of the data store, such that the resiliency scheme for the specified portion of the data store is changed, while the resiliency scheme for other portions of the data store is not changed.
 14. The method of claim 13, wherein the resiliency scheme for the specified portion of the data store is changed from mirror to parity or from parity to mirror.
 15. The method of claim 13, further comprising: adding a storage device to the data store, wherein the specified portion of the data store is implementing an N-way mirror resiliency scheme; and implementing an N+1-way mirroring scheme for the data store, wherein the data store data is split between two storage devices.
 16. The method of claim 13, further comprising: removing a storage device from the data store; and rebalancing the data that was stored on the removed data storage device among the remaining storage devices, without rebalancing existing data on the remaining storage devices.
 17. The method of claim 16, wherein allocations are implemented within the data store to logically define specified areas of storage, each allocation identifying where the allocation is located within the data store, what data it contains and where its data is stored on one or more different data storage devices.
 18. A computer system comprising the following: one or more processors; one or more computer-readable storage media having stored thereon computer-executable instructions that, when executed by the one or more processors, cause the computing system to perform a method for modifying storage capacity within a data store, the method comprising the following: receiving a request to move one or more portions of data off of a first data store and on to a second data store; identifying which data is to be moved from the first data store to the second data store; creating a new allocation on the second data store, the new allocation being configured to receive at least a portion of data from the first data store; and moving the data to the new allocation on the second data store as data I/O requests are received at the first data store, such that data writes are sent to both the first and second data stores, and data reads are sent to the first data store until the data of the first data store is copied to the new allocation on the second data store.
 19. The computer system of claim 18, wherein allocations are implemented within the data store to logically define specified areas of storage, each allocation identifying where the allocation is located within the data store, what data it contains and where its data is stored on one or more different data storage devices, the allocations being stored in a mapping table.
 20. The computer system of claim 19, further comprising: removing at least one storage device from the data store; accessing the mapping table to determine which allocations were stored on the removed storage devices; and rebalancing the data of the allocations stored on the removed drive to one or more other storage devices of the data store. 